SwePub
Tyck till om SwePub Sök här!
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "db:Swepub ;pers:(Lu Zhonghai);hsvcat:1"

Sökning: db:Swepub > Lu Zhonghai > Naturvetenskap

  • Resultat 1-10 av 66
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Liu, Ming, et al. (författare)
  • Hardware/Software co-design of a general-purpose computation platform in particle physics
  • 2007
  • Ingår i: ICFPT 2007. - 9781424414710 ; , s. 177-183
  • Konferensbidrag (refereegranskat)abstract
    • In this paper we present a hardware/software co-design based computation platform for online data processing in particle physics experiments. Our goal is to ease and accelerate the development and make it universal and scalable for multiple applications, on the premise of guaranteeing high communicating and processing capabilities. The entire computation network consists of quite a few interconnected compute nodes, each of which has multiple FPGAs to implement specific algorithms for data processing. High-speed communication features including RocketIO multi-gigabit transceiver and Gigabit Ethernet are supported by FPGAs to construct internal and external connections. An embedded Linux operating system is fitted on the PowerPC CPU core inside the Xilinx Virtex-4 FX FPGA. Thus programmers can access hardware resources via device drivers and write application programs to manage the system from the high level. Furthermore measurements have been executed using the development board to investigate both communicating and processing performances of the system. Results show that the computation platform is able to communicate at a UDP/IP data rate of around 400 Mbps per Ethernet link, and the event selection engine could process an event rate of 25%.
  •  
2.
  • Zhu, Wenyao, et al. (författare)
  • Evaluation of Time Series Clustering on Embedded Sensor Platform
  • 2021
  • Ingår i: 2021 24TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2021). - : Institute of Electrical and Electronics Engineers (IEEE). ; , s. 187-191
  • Konferensbidrag (refereegranskat)abstract
    • Clustering is one of the major problems in studying the time series data, while solving this problem on the embedded platform is almost absent because of the limitation of computational resources on the edge. In this paper, two typical clustering algorithms, K-means and Self-Organizing Map (SOM), together with Euclidean distance measurement and dynamic time warping (DTW) are studied to verify their feasibility on an embedded sensor platform. For the given datasets, the models are trained on a computer and moved to an ESP32 microprocessor for inference. It is found that the SOM achieves similar accuracy compared with K-means, while its inference process takes a longer time. The experiment results show that a sample with 300 data points can be clustered into 12 clusters within 40 ms by SOM with the DTW model, while the fastest model can run at around 2 ms using K-means with Euclidean distance model. In other words, it can process the data collected from 40 sensors per second in 680 ms. The clustering function can be scheduled with the real-time data acquisition and transmission tasks. The performance gathered supports that it is feasible to deploy the time series clustering model on the embedded sensor platform.
  •  
3.
  • Becker, Matthias, 1986-, et al. (författare)
  • An adaptive resource provisioning scheme for industrial SDN networks
  • 2019
  • Ingår i: IEEE International Conference on Industrial Informatics (INDIN). - : Institute of Electrical and Electronics Engineers Inc.. - 9781728129273 ; , s. 877-880
  • Konferensbidrag (refereegranskat)abstract
    • Many industrial domains face the challenge of ever growing networks, driven for example by Internet-of-Things and Industry 4.0. This typically comes together with increased network configuration and management efforts. In addition to the increasing network size, these domains typically are subject to adaptive load situations that pose an additional challenge on the network infrastructure.Software defined networking (SDN) is a promising networking paradigm that reduces configuration complexity and management effort in Ethernet networks. In this work, we investigate SDN in context of adaptive scenarios with QoS constraints. Our approach applies monitoring of several thresholds which automatically trigger redistribution of resources via the central SDN controller. This setup leads to an agile system that can dynamically react to load changes while the infrastructure is not overprovisioned. The approach is implemented in a low-level simulation environment where we demonstrate the benefits of the approach using a case study.
  •  
4.
  • Chen, Xiaowen, et al. (författare)
  • Performance analysis of homogeneous on-chip large-scale parallel computing architectures for data-parallel applications
  • 2015
  • Ingår i: Journal of Electrical and Computer Engineering. - : Hindawi Limited. - 2090-0147 .- 2090-0155. ; 2015
  • Tidskriftsartikel (refereegranskat)abstract
    • On-chip computing platforms are evolving from single-core bus-based systems to many-core network-based systems, which are referred to as On-chip Large-scale Parallel Computing Architectures (OLPCs) in the paper. Homogenous OLPCs feature strong regularity and scalability due to its identical cores and routers. Data-parallel applications have their parallel data subsets that are handled individually by the same program running in different cores. Therefore, data-parallel applications are able to obtain good speedup in homogenous OLPCs. The paper addresses modeling the speedup performance of homogeneous OLPCs for data-parallel applications. When establishing the speedup performance model, the network communication latency and the ways of storing data of data-parallel applications are modeled and analyzed in detail. Two abstract concepts (equivalent serial packet and equivalent serial communication) are proposed to construct the network communication latency model. The uniform and hotspot traffic models are adopted to reflect the ways of storing data. Some useful suggestions are presented during the performance model's analysis. Finally, three data-parallel applications are performed on our cycle-accurate homogenous OLPC experimental platform to validate the analytic results and demonstrate that our study provides a feasible way to estimate and evaluate the performance of data-parallel applications onto homogenous OLPCs.
  •  
5.
  • Chen, Xiaowen, et al. (författare)
  • Reducing Virtual-to-Physical address translation overhead in Distributed Shared Memory based multi-core Network-on-Chips according to data property
  • 2013
  • Ingår i: Computers & electrical engineering. - : Elsevier BV. - 0045-7906 .- 1879-0755. ; 39:2, s. 596-612
  • Tidskriftsartikel (refereegranskat)abstract
    • In Network-on-Chip (NoC) based multi-core platforms, Distributed Shared Memory (DSM) preferably uses virtual addressing in order to hide the physical locations of the memories. However, this incurs performance penalty due to the Virtual-to-Physical (V2P) address translation overhead for all memory accesses. Based on the data property which can be either private or shared, this paper proposes a hybrid DSM which partitions a local memory into a private and a shared part. The private part is accessed directly using physical addressing and the shared part using virtual addressing. In particular, the partitioning boundary can be configured statically at design time and dynamically at runtime. The dynamic configuration further removes the V2P address translation overhead for those data with changeable property when they become private at runtime. In the experiments with three applications (matrix multiplication, 2D FFT, and H.264/AVC encoding), compared with the conventional DSM, our techniques show performance improvement up to 37.89%.
  •  
6.
  • Li, Cunlu, et al. (författare)
  • RoB-Router : A Reorder Buffer Enabled Low Latency Network-on-Chip Router
  • 2018
  • Ingår i: IEEE Transactions on Parallel and Distributed Systems. - : IEEE COMPUTER SOC. - 1045-9219 .- 1558-2183. ; 29:9, s. 2090-2104
  • Tidskriftsartikel (refereegranskat)abstract
    • Traditional input-queued routers in network-on-chips (NoCs) only have a small number of virtual channels (VCs) and packets in a VC are organized in a fixed order. Such design is susceptible to head-of-line (HoL) blocking as only the packet at the head of a VC can be allocated by the switch allocator. Since switch allocation is the critical pipeline stage in on-chip routers, HoL blocking significantly degrades the performance of NoCs. In this paper, we propose to schedule packets in input buffers utilizing reorder buffer (RoB) techniques. We design VCs as RoBs to allow packets located not at the head of a VC to be allocated before the head packets. RoBs reduce the conflicts in switch allocation and mitigate the HoL blocking and thus improve the NoC performance. However, it is hard to reorder all the units in a VC due to circuit complexity and power overhead. We propose RoB-Router, which leverages elastic RoBs in VCs to only allow a part of a VC to act as RoB. RoB-Router automatically determines the length of RoB in a VC based on the number of buffered flits. This design minimizes the resource while achieving excellent efficiency. Furthermore, we propose two independent methods to improve the performance of RoB-Router. One is to optimize the packet order in input buffers by redesigning VC allocation strategy. The other combines RoB-Router with current most efficient switch allocator TS-Router. We perform evaluations and the results show that our design can achieve 46 and 15.7 percent performance improvement in packet latency under synthetic traffic and traces from PARSEC than TS-Router, and the cost of energy and area is moderate. Additionally, average packet latency reduction by our two improving methods under uniform traffic is 13 and 17 percent respectively.
  •  
7.
  • Liu, Ming, 1982- (författare)
  • Adaptive Computing based on FPGA Run-time Reconfigurability
  • 2011
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • In the past two decades, FPGA has been witnessed from its restricted use as glue logic towards real System-on-Chip (SoC) platforms. Profiting from the great development on semiconductor and IC technologies, the programmability of FPGAs enables themselves wide adoption in all kinds of aspects of embedded designs. Modern FPGAs provide the additional capability of being dynamically and partially reconfigured during the system run-time. The run-time reconfigurability enhances FPGA designs from the sole spatial to both spatial and temporal parallelism, providing more design flexibility for advanced system features. Adaptive computing delegates an advanced computing paradigm in which computation tasks and resources are intelligently managed in correspondence with conditional requirements. In this thesis, we investigate adaptive designs on FPGA platforms: We present a comprehensive and practical design framework for adaptive computing based on the FPGA run-time reconfigurability. It concerns several design key issues in different hardware/software layers, specifically hardware architecture, run-time reconfiguration technical support, OS and device drivers, hardware process scheduler, context switching as well as Inter-Process Communications (IPC). Targeting a special application of data acquisition (DAQ) and trigger systems in nuclear and particle physics experiments, we set up the data streaming model and conduct theoretical analysis on the adaptive system. Three application studies are employed to verify the proposed adaptive design framework: The first application demonstrates a peripheral controller adaptable system aiming at general embedded designs. Through dynamically loading/unloading a NOR flash memory controller and an SRAM controller, both flash memory and SRAM accesses may be accomplished with less resource consumption than in traditional static designs. In the second case, two real algorithm processing engines are adaptively time-multiplexed in the same reconfigurable slot for particle recognition computation. Experimental results reveal the reduced on-chip resource requirements, as well as an approximate processing capability of the peer static design. Taking advantage of the FPGA dynamic reconfigurability, we present in the third application a novel on-FPGA interconnection microarchitecture named RouterLess NoC (RL-NoC). RL-NoC employs the novel design concept of Move Logic Not Data (MLND), and significantly distinguishes itself from the existing interconnection architectures such as buses, crossbars or NoCs. It does not rely on routers to deliver packets hop by hop as canonical NoCs do, but buffers data packets in virtual channels and brings various nodes using run-time reconfiguration to produce or consume them. In comparison with canonical packet-switching NoCs, the routerless architecture features lower design complexity, less resource consumption, higher work frequency, more efficient power dissipation as well as comparable or even higher packet delivery efficiency. It is regarded as a promising interconnection approach in some design scenarios on FPGAs, especially for light-weight applications.
  •  
8.
  • Liu, Ming, et al. (författare)
  • System-on-an-FPGA Design for Real-time Particle Track Recognition and Reconstruction in Physics Experiments
  • 2008
  • Ingår i: 11TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN - ARCHITECTURES, METHODS AND TOOLS. - LOS ALAMITOS : IEEE COMPUTER SOC. ; , s. 599-605
  • Konferensbidrag (refereegranskat)abstract
    • In particle physics experiments, the momenta of charged particles are studied by observing their deflection in a magnetic field. Dedicated detectors measure the particle tracks and complex algorithms are required for track recognition and reconstruction. This CPU-intensive task is usually implemented as off-line software running on PC clusters. In this paper we present a system-on-chip design for the track recognition and reconstruction based on modern FPGA technologies. The basic principle of the algorithm is polled from software into the FPGA fabric. The fundamental architecture of the tracking processor is described in detail. Working as processing engines in compute nodes, the tracking processor contributes to recognize potential track candidates in real-time and promotes the selection efficiency of the data acquisition and trigger system. Our design study shows that the tracking module can be integrated in a single Xilinx Virtex-4 FX60 FPGA. The processing capability of the design is about 16.7K sub-events per second per module with our experimental setup, which achieves 20 times speedup compared to the software implementation.
  •  
9.
  • Long, Y. -C, et al. (författare)
  • Analysis and Evaluation of Delay Bounds for Multiplexing Models Based on Network Calculus
  • 2018
  • Ingår i: Tien Tzu Hsueh Pao. - : Chinese Institute of Electronics. - 0372-2112. ; 46:8, s. 1815-1821
  • Tidskriftsartikel (refereegranskat)abstract
    • In resource-sharing communication media such as buses, crossbars and networks, multiplexings are inevitable. While sending packets over a multiplexing node, the worst-case delay bound can be computed using network calculus. The tightness of such delay bound remains an open problem. This paper studies different analysis approaches for multiplexing models, from the single multiplexing node to multi-flow-multi-node model, applying two traffic arrival models, and two service properties when getting equivalent service curves. We analyze per-flow delay bounds with different models, then empirically evaluate the tightness of the delay bounds. Our results show the quality of different analysis models, and how influential each parameter is to tightness.
  •  
10.
  • Lu, Zhonghai, et al. (författare)
  • A power efficient flit-admission scheme for wormhole-switched networks on chip
  • 2005
  • Ingår i: WMSCI 2005. - 9789806560567 ; , s. 25-30
  • Konferensbidrag (refereegranskat)abstract
    • Reducing power consumption is a main challenge when adopting a network as a global on-chip communication interconnect since the reduction in power dissipation should not at the expense of degrading the system performance. We investigate power in a wormhole-switched network with focus on the impact of flit-admission schemes, i.e., when and how the flits of packets are admitted into the network We have proposed a novel flit-admission scheme that shows significant shrink of the switch complexity while maintaining equivalent network performance. This paper investigates its influence in network power involving both switches and links. We conduct experiments on a 2D mesh network. The results show that our flit-admission scheme achieves significant power and area reduction without performance penalty. To our knowledge, our work is the first study of power dissipation on flit admission schemes.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-10 av 66

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy